According to the Loss Distribution Approach, the operational risk of a bank is determined as the 99.9% quantile of
the respective loss distribution, covering unexpected severe events. The 99.9% quantile can be considered a tail event. As
supported by the Pickands-Balkema-de Haan Theorem, tail events exceeding some high threshold are usually modeled by a
Generalized Pareto Distribution (GPD). Estimation of GPD tail quantiles is not a trivial task, in particular if one takes into
account the heavy tails of this distribution, the possibility of singular outliers, and, moreover, the fact that data is usually
pooled among several sources.

In such situations, robust methods may provide stable estimates when classical methods already fail. 

In this paper, the optimally-robust procedures MBRE, OMSE, and RMXE are introduced to the application domain of operational
risk. We apply these procedures to parameter estimation of a GPD for data from Algorithmics Inc. To better understand these
results, we provide supportive diagnostic plots adjusted for this context: influence plots, outlyingness plots, and QQ plots
with robust confidence bands.

Keywords: operational risk, Generalized Pareto Distribution, robust estimation, diagnostic plot 



Introduction 



Operational risk according to Basel II (2006) covers risks of loss resulting from inadequate or failed internal processes, people
or systems, or from external events. Neither is this risk new, nor is the need to measure it. Still, it remains a challenging issue,
in particular as far as very large operational losses are concerned, compare De Fontnouvelle et al. (2006). This is reflected
by the sizeable amount of media coverage we have seen in the last few years: rogue trading, e.g., at Daiwa Bank (1984-95),
Sumitomo Corp. (1986-96), Barings (1995), and Société Générale (2006-2008), losses caused by the 9/11 terrorist attacks
(2001), by the B. L. Madoff fraud (1980s-2008), by hurricane Katrina (2005), and by the recent earthquake in Japan (2011).
Because of its impact, operational risk has been integrated into the Basel II framework of regulatory requirements. The focus 
of the present paper lies in (robust) quantification of the respective regulatory capital. 

One of the most challenging problems in this context is data — both as to quantity and as to quality. We notice that, 
fortunately for our economies, very large operational losses are observed rarely. Still, they have a tremendous effect. As 
a consequence, usually only some few observations will have an overwhelming impact on the computed regulatory capital. 
In addition, in a realistic modeling, taking into account possible model deviations, one cannot tell (without error) whether 
these events are singular outliers or reproducible and, hence, contribute valuable evidence for future losses. This question of 
relevance for future losses gets even more severe in the common and Basel-II-recommended practice of data pooling used to 
overcome the lack of historical (very large) loss data. 

Let us illustrate this: in quantifying risk, usually the tail behavior of the underlying distribution as expressed by tail 
quantiles (VaR) or truncated moments (CVaR) is crucial. Estimating these population quantiles by their empirical counterparts 
apparently is drastically prone to outliers: for the 99.9% quantile as typically used in operational risk, for 5000 observations, 
five irreproducible, extra-ordinarily large observations suffice to render this procedure completely meaningless. Passing to 
parametric models from extreme value theory per se is no remedy: maximum likelihood estimators (MLEs), optimal in this 
context, as a rule still attribute unbounded influence to some exposed observations, e.g., in our example, five outliers will still 
suffice to invalidate our conclusions. 
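
The effect on the empirical quantile can be reproduced in a few lines. The following is a minimal sketch, not taken from the paper: the GPD shape 0.7, the outlier size 1e12, the seed, and the linear-interpolation quantile convention are assumptions made purely for illustration. With exactly five outliers among 5000 observations the 99.9% level sits right at the boundary, so conventions that pick a single order statistic may react only from the sixth outlier on.

```python
import math
import random

def empirical_quantile(xs, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(xs)
    h = (len(s) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])

def gpd_sample(n, xi, beta, rng):
    """Inverse-transform sampling from a GPD with threshold 0 (xi > 0)."""
    return [beta / xi * ((1.0 - rng.random()) ** (-xi) - 1.0) for _ in range(n)]

rng = random.Random(1)
clean = gpd_sample(5000, xi=0.7, beta=1.0, rng=rng)
# Replace five observations by irreproducible, extraordinarily large values.
contaminated = clean[5:] + [1e12] * 5

q_clean = empirical_quantile(clean, 0.999)
q_contaminated = empirical_quantile(contaminated, 0.999)
print(q_clean, q_contaminated)
```

Under these assumptions the contaminated quantile is driven by the arbitrary outlier value rather than by the 4995 regular observations.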

This is where robust statistics steps in. It aims at designing procedures which remain stable under minor model deviations; 
these deviations can stand for a minority of unpredictable outliers for which we cannot anticipate any model distribution. In 
our little illustration, robust statistics provides procedures bounding the influence of single observations. 



While Chernobai and Rachev (2006) have introduced general robust concepts to the domain of operational risk, the contribution
of this paper is the application of optimally robust procedures to the quantification of operational risk, more precisely to
data from the Algo OpData database of Algorithmics Inc. To this end, we focus on the part of operational risk caused by very
large losses; i.e., on the tail distribution of the severity of operational losses, which leads us canonically to (optimally-robust)
parametric estimation in generalized Pareto distributions.

To this end, we present a comprehensive, self-contained survey of the shrinking neighborhood setup, in which the respective
optimally-robust estimators are derived. We do not repeat the respective derivation here, nor do we conduct a simulation
study, comparing them to competitor estimators which could support our findings also for finite sample sizes. Instead, we
refer the reader to Ruckdeschel and Horbenko (2010) which contains all this.



To judge the quality of our estimators when applied at real data sets (where fulfillment of the actual model assumptions is 
not clear), we contribute the translation of some diagnostic plots from robust statistics to this application domain which help 
us to understand and quantify the effect of our robustifications. 

The rest of the paper is organized as follows: the setup is presented in detail in Section 1, starting with the regulatory
framework, describing the data situation, and defining the mathematical setup in two parts: the Loss Distribution Approach
and the generalized Pareto distribution to model the tail of the severity distribution. Section 2 continues with robustness: after
an introduction of the central concepts of robust statistics we give a short summary of the literature on robustness approaches
relevant for operational risk. Its longest subsection, Subsection 2.3, contains the announced self-contained summary of the
shrinking neighborhood approach of robust statistics, in which we have obtained the optimally-robust estimators OMSE,
MBRE, and RMXE used in the sequel. At the end of this section, we provide some implementation details. In Section 3, the
data set from Algorithmics Inc. is discussed, together with the evaluation of the considered estimators on this data. Section 4
finally provides the diagnostic plots, which are again produced and explained for data from the Algo OpData database. A
conclusion section at the end summarizes our main findings.



1 Setup 

1.1 Regulatory Framework 

The most important international set of regulatory rules for financial institutions is given by the Basel II framework for
the International Convergence of Capital Measurement and Capital Standards (Basel II (2006)), which in particular covers
operational risk.

The question of to whom Basel II applies is currently an important political one, but not the topic of this article. We only
note that Basel II is binding for all financial services institutions in the European Union since 2007, but so far only covers the
largest or most internationally active banks in the USA, a situation to be changed only in the upcoming Basel III framework
targeted for implementation in 2013. The results of a survey of the Basel Committee on Banking Supervision (BIS (2010a))
indicate that 112 countries have implemented or are currently planning to implement Basel II.

According to the Basel II framework, every bank has to estimate its operational risk and hold the appropriate regulatory 
capital to ensure its solvency and economic stability in case of foreseeable operational losses. While Basel II rules mainly 
address large, internationally active banks, their basic concepts should be applicable to banks of varying organizational and 
product line complexity. 

Basel II further recommends certain approaches for measuring the operational risk: the Basic Indicator Approach, the
Standard Approach, and the Advanced Measurement Approaches (AMAs). The most sophisticated approaches are gathered in
group AMA, which is advised for large international banks, but also subject to supervisory approval (§655, Basel II (2006)),
for which a bank must meet certain qualitative and quantitative standards. The focus of this paper lies on the Loss Distribution
Approach (LDA), which is a particular AMA to be discussed in Subsection 1.3.

1.2 Data Situation 

LDA suggests measuring the operational risk based on historical data using information about the frequency and severity of 
earlier losses. To this end, according to Basel II, past operational losses of a bank should be documented in internal databases. 
These losses can roughly be divided into three types: expected (occasional and moderate), unexpected (rare, but large),
and catastrophic (very rare, extreme) losses (see Figure 1), where, according to (Basel II, 2006, §669 (b)), the regulatory
capital is obtained as the sum of expected and unexpected losses. As mentioned in the introduction, unexpected losses are
rare events, so the data situation is most difficult for this segment.

As internal time series of this type are usually short and sparse, external data from losses of other banks documented
in publicly available data pools (such as Algorithmics Algo OpData, SAS OpRisk Global Data) or databases of consortia
of banks (e.g., ORX), as well as scenario-based data and internal control factors, should be included into estimation of the
regulatory capital (BIS). Inclusion of external data introduces new statistical challenges:



First of all, there is the question of size and comparability; a priori it is far from clear whether losses of one bank could
occur at all at another bank, and if so, at which scale. Scaling of external data to a particular bank is a topic in its own right
and has been dealt with in detail in Cope and Labbi (2008) and Chernobai et al. (2011); we do not go into this in this article.
We only note that even after a proper and robust scaling step, the robustness issue is not yet removed.

In addition, we face a censoring problem, especially for external data, since data usually is only reported beyond a
certain threshold. For internal reporting, this threshold is usually set relatively low, e.g., at 10,000 EUR according to a
Basel II suggestion. For losses reported to the outside world, it varies from EUR 20,000 (ORX) to 1 million USD (Algo
OpData). So in particular external loss samples will be biased, containing disproportionally high numbers of very large
losses (De Fontnouvelle et al. (2006)).



1.3 Mathematical Setup I: Loss Distribution Approach 

As indicated, this paper focuses on the Loss Distribution Approach (LDA). 
Figure 1: Loss Categories



In this approach, banks estimate the operational risk separately for each of some eight business lines and seven event 
types, giving a partition into a matrix, see Table 1. The cells of this matrix are not stochastically independent, which is 
relevant for the aggregation of these cells to a total operational risk exposure; to this end, the cell dependence structure is 
usually captured by copula techniques. In this paper, however, we skip this aggregational step and rather confine ourselves to 
cell-individual results. 

As in the Collective Model in actuarial context, in LDA, severity and frequency of the operational losses are modeled
separately, and total loss is determined as the compound distribution; i.e., given the distribution of the frequencies $N(t)$ within
time period $t$ and distribution $F$ of loss severities $X_i$, the aggregate or cumulative loss $L_t$ over $t$ is calculated as

$$L_t = \sum_{i=1}^{N(t)} X_i.$$

In this context, the annual frequency of losses is usually modeled by a Poisson or negative binomial distribution (Moscadelli
(2004)). Assuming independent and identically distributed severities $X_i \sim F$, $L_t$ is said to follow a compound process with
distribution $\mathcal{L}(L_t)$ given by its cumulative distribution function (cdf)

$$P(L_t \le x) = \sum_{k=0}^{\infty} P(N(t) = k)\, F^{*k}(x), \qquad k \in \mathbb{N},$$

where $F^{*k}(x)$ is the cdf of the $k$-fold convolution of $F$. The regulatory capital is determined applying a risk measure,
e.g. Value-at-Risk (VaR), to $\mathcal{L}(L_t)$. The operational Value-at-Risk (OpVaR$_\alpha$) is the corresponding $\alpha$ quantile of $\mathcal{L}(L_t)$ as
required by Basel II. Computation of this compound distribution $\mathcal{L}(L_t)$ can be tackled by simulations, approximations, and
other techniques; we do not detail this here.
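
As an illustration of the simulation route, the following minimal sketch draws realizations of the aggregate loss and reads off an empirical quantile. It is not the paper's procedure: the Poisson rate 5, the lognormal severity, the number of replications, and all function names are assumptions chosen for illustration.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate rates lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_aggregate_losses(lam, severity, n_sim, rng):
    """Draw n_sim realizations of L_t = X_1 + ... + X_N with N ~ Poisson(lam)."""
    return [sum(severity(rng) for _ in range(poisson(lam, rng)))
            for _ in range(n_sim)]

def op_var(losses, alpha):
    """Empirical alpha-quantile (an order statistic) of simulated losses."""
    s = sorted(losses)
    return s[min(len(s) - 1, math.ceil(alpha * len(s)) - 1)]

rng = random.Random(42)
lognormal = lambda r: r.lognormvariate(0.0, 1.0)  # illustrative severity only
losses = simulate_aggregate_losses(lam=5.0, severity=lognormal, n_sim=20000, rng=rng)
var_999 = op_var(losses, 0.999)
print(var_999)
```

For heavy-tailed severities the Monte Carlo error of such a high quantile can be substantial, which is one reason approximations are also of interest.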

Generally, loss data can be fitted to a variety of severity distributions: medium-tailed Exponential, Lognormal, Gamma,
Gumbel; heavy-tailed Pareto, GPD, Burr, Loggamma, Weibull with shape parameter smaller than 1. Basel II requirements stress
that a bank must be able to demonstrate that its approach captures 'tail' events (§667, Basel II (2006)). As discussed by
De Fontnouvelle et al. (2006), there is evidence for individual operational losses of banks which are heavy-tailed with existing
first but infinite second moments; moreover, for pooled data even the first moments may not exist (Moscadelli (2004)).

If the underlying severity distribution $F$ is subexponential$^1$, Böcker and Klüppelberg (2005) show the validity of the
following first-order approximation for a high quantile of the compound distribution, the so-called single-loss approximation:

$$\mathrm{OpVaR}_\alpha \approx F^{-1}\Big(1 - \frac{1-\alpha}{\lambda t}\Big), \tag{1}$$

where $\lambda = E(N(t))/t$ is the expected frequency per unit of time $t$, and is the rate or intensity of the Poisson (point) process
of loss events; $F^{-1}$ is a corresponding quantile function. For Poisson distributed $N(t)$, we note that the MLE for $\lambda$ is just
the average number of losses over time period $t$. As in practice all commonly used heavy-tailed distributions belong to the
subexponential class, it is usually enough to estimate the quantile of the severity distribution of losses only.

1 A distribution $F$ is subexponential iff $\overline{F^{*n}}(x) \approx n \bar F(x)$ as $x \to \infty$, where $\overline{F^{*n}}$ is the survival function of an $n$-fold convolution of $F$.
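
The single-loss approximation reduces the computation to one evaluation of the severity quantile function. The sketch below illustrates this under stated assumptions: the function name, the Pareto severity with tail index 2, and the values of the level and the frequency are hypothetical and only serve as an example.

```python
def single_loss_opvar(severity_quantile, alpha, lam, t=1.0):
    """First-order approximation OpVaR_alpha ~ F^{-1}(1 - (1 - alpha) / (lam * t))."""
    level = 1.0 - (1.0 - alpha) / (lam * t)
    if not 0.0 < level < 1.0:
        raise ValueError("approximation needs (1 - alpha) < lam * t")
    return severity_quantile(level)

# Illustrative severity: Pareto with F(x) = 1 - x**(-2) for x >= 1,
# whose quantile function is F^{-1}(p) = (1 - p)**(-1/2).
pareto_quantile = lambda p: (1.0 - p) ** (-0.5)

approx = single_loss_opvar(pareto_quantile, alpha=0.999, lam=10.0)
print(approx)
```

With ten expected losses per period, the 99.9% quantile of the aggregate loss is approximated by the 99.99% quantile of a single loss, which is approximately 100 in this toy example.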



1.4 Mathematical Setup II: Generalized Pareto Distribution 

As we are interested in the tails of the severity distribution, extreme value theory (EVT) applies, providing models for rare
and extreme events, see Chavez-Demoulin et al. (2006); Embrechts et al. (2003); Nešlehová et al. (2006).

One of the most prominent results of EVT, the Pickands-Balkema-de Haan Theorem (see Balkema and de Haan (1974);
Pickands (1975)), states that if the distribution of the standardized maxima of $X$ tends to an extreme value distribution, the
peaks over a high threshold $u$ are asymptotically distributed as a generalized Pareto distribution (GPD) $G_{u,\xi,\beta}$:

$$P(X - u \le x \mid X > u) \approx G_{u,\xi,\beta}(x), \qquad x > u.$$

This gives rise to the so-called Peaks Over Threshold method (POT) and motivates the use of the GPD for modeling the 
tail of the severity distribution, provided threshold u is chosen appropriately. 

Limitations of this motivation are given by its asymptotic nature and its applying for extremal order statistics only. Hence, 
to obtain thresholds for which this motivation applies, one has to find a suitable trade-off between the lack of data beyond this 
high threshold and a large deviation from the asymptotic distribution. 



Parameters of the GPD The GPD is specified through its cdf:

$$G_{u,\xi,\beta}(x) = 1 - \Big(1 + \xi\,\frac{x-u}{\beta}\Big)^{-1/\xi}, \qquad x > u.$$



In the GPD, the shape parameter $\xi$ controls the form of the distribution; more specifically, only values $\xi > 0$ are of
interest in our context, as otherwise the support of this distribution will be bounded. $\beta > 0$ is a scale parameter and $u$ is
a location parameter, which acts as threshold, usually unknown. Estimation of $u$ is a difficult task, as standard methods
from smooth parametric statistics do not apply. Several approaches, using criteria such as minimum mean prediction error
(or robust variants) or minimum squared error, are around though, compare Beirlant et al. (1996, 1999); Dupuis (1998);
Dupuis and Victoria-Feser (2006); Vandewalle et al. (2007).
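
For concreteness, the GPD cdf with threshold $u$, shape $\xi > 0$, and scale $\beta > 0$, together with its inverse, can be written down directly. The function names below are hypothetical and the numerical values in the usage lines are arbitrary illustrations.

```python
def gpd_cdf(x, u, xi, beta):
    """G(x) = 1 - (1 + xi * (x - u) / beta) ** (-1 / xi) for x > u (xi, beta > 0)."""
    if x <= u:
        return 0.0
    return 1.0 - (1.0 + xi * (x - u) / beta) ** (-1.0 / xi)

def gpd_quantile(p, u, xi, beta):
    """Inverse of gpd_cdf for 0 <= p < 1."""
    return u + beta / xi * ((1.0 - p) ** (-xi) - 1.0)

# Arbitrary illustrative parameters: threshold 1e6, shape 0.7, scale 2e5.
x999 = gpd_quantile(0.999, u=1e6, xi=0.7, beta=2e5)
print(x999, gpd_cdf(x999, u=1e6, xi=0.7, beta=2e5))
```

The quantile grows like $(1-p)^{-\xi}$, so small changes in the estimated shape translate into large changes in extreme quantiles.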

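Note that the underlying distribution of $X$ can be approximated as:

$$\bar F(x) \approx \frac{N_u}{n}\,\big(1 - G_{u,\xi,\beta}(x)\big), \qquad x > u,$$

where $n$ is the total sample size, $N_u$ is the number of exceedances over the threshold $u$, and $\bar F(x) = 1 - F(x)$ is the survival
function. Applying (1), the operational $\alpha$-Value-at-Risk of a compound loss then is merely a corresponding $\alpha'$-quantile of $F$ and
equal to

$$\mathrm{OpVaR}_\alpha \approx u + \frac{\beta}{\xi}\Big[\Big(\frac{n\,(1-\alpha)}{N_u\,\lambda t}\Big)^{-\xi} - 1\Big].$$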
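
A minimal sketch of how such a tail-based OpVaR could be computed follows. It assumes that the severity tail beyond $u$ is approximated by $(N_u/n)$ times the GPD survival function and that the single-loss approximation asks for the severity survival probability $(1-\alpha)/(\lambda t)$; the function names and all numbers are hypothetical.

```python
def pot_tail(x, n, n_u, u, xi, beta):
    """Approximate severity survival function beyond u: (N_u / n) * (1 - G(x))."""
    return n_u / n * (1.0 + xi * (x - u) / beta) ** (-1.0 / xi)

def pot_opvar(alpha, lam, t, n, n_u, u, xi, beta):
    """Solve pot_tail(x) = (1 - alpha) / (lam * t) for x."""
    tail_prob = (1.0 - alpha) / (lam * t)   # required survival probability
    cond_prob = tail_prob * n / n_u         # same, conditional on exceeding u
    if not 0.0 < cond_prob <= 1.0:
        raise ValueError("target quantile lies below the threshold u")
    return u + beta / xi * (cond_prob ** (-xi) - 1.0)

# Hypothetical inputs: 5000 losses, 250 above u = 1e6, 20 losses per year.
v = pot_opvar(alpha=0.999, lam=20.0, t=1.0, n=5000, n_u=250,
              u=1e6, xi=0.7, beta=5e5)
print(v)
```

Plugging the result back into the tail approximation returns the target probability, which is a convenient consistency check for any implementation.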



Estimation of the parameter $\theta$ (comprising scale $\beta$ and shape $\xi$) for given threshold $u$ in GPD models has widely been studied. A detailed analysis of existing
and new methods for the estimation of GPD, both classical and robust, can be found in Ruckdeschel and Horbenko (2010).
This also is the reference model for the remainder of this paper, i.e., for some given $u \in \mathbb{R}$, the model

$$\{G_{u,\xi,\beta} \mid \beta > 0,\ \xi > 0\}. \tag{2}$$



2 Robust Statistics 

Robustness is a stability notion. In robust statistics, it denotes stability with respect to deviations from the distributional
assumptions, most prominently caused by outliers. There is a vast body of literature on this topic, starting with Huber
and with excellent monographs given by, e.g., Huber (1981), Hampel et al. (1986), Rieder (1994), Maronna et al. (2006).

In this section, we compile the necessary concepts and results from robust statistics needed to obtain the optimally-robust 
estimators used in this article. Most of this section holds for general (smooth) parametric models. The respective terms for 
model (2), though, are spelt out in Section 2.3.

2.1 Robustness Concepts

Mathematics has long been concerned with stability and offers a set of very fruitful concepts to operationalize it: continuity, 
differentiability, closeness to singularities. 

To make these available in our context, it helps to consider an estimator as a function of the underlying distribution. More
precisely, we will consider functionals $T$ mapping (a subset of all) distributions (at least all model distributions $F_\theta$) to the
parameter set $\Theta$. If we plug in the true model distribution $F_\theta$ it is natural to require that $T(F_\theta) = \theta$, i.e., Fisher consistency.
In this setting, the estimator is interpreted as $T$ applied to the empirical distribution $\hat F_n$.

As for growing sample size, in the classical setup, the empirical distribution will converge to the theoretical one according
to the Glivenko-Cantelli, respectively Donsker Theorems (van der Vaart, 1998, e.g. Thms. 19.1, 19.3), we would expect that
a "good" functional will respect this convergence in the sense that also $T(\hat F_n) \to T(F_\theta) = \theta$ suitably.

In particular, weak convergence, as stated in Donsker's Theorem, respectively closeness in weak topology, help to formu- 
late interesting neighborhoods of distributions able to capture deviations of distributions or outlier phenomena. 

One of the most practicable types of such neighborhoods is the Gross Error Model: for a given, central distribution $F$ and
a radius $\varepsilon \in [0, 1)$, we consider the set (or ball) of all distributions $Q$ obtained as

$$\mathcal{U} = \{Q \mid Q = (1-\varepsilon)F + \varepsilon H\} \tag{3}$$

where $H$ is an unknown, uncontrollable, unpredictable outlier generating distribution.
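
Sampling from a member of such a neighborhood is straightforward: each observation is drawn from the outlier distribution with probability $\varepsilon$ and from the central distribution otherwise. The sketch below assumes, purely for illustration, a GPD with shape 0.7 as central distribution and a point mass at 1e9 as one arbitrary choice of $H$; all names are hypothetical.

```python
import random

def sample_gross_error(n, eps, ideal, outlier, rng):
    """Each observation comes from `outlier` with probability eps, else from `ideal`."""
    return [outlier(rng) if rng.random() < eps else ideal(rng) for _ in range(n)]

def gpd_draw(rng, xi=0.7, beta=1.0):
    """One GPD variate (threshold 0) by inverse transform."""
    return beta / xi * ((1.0 - rng.random()) ** (-xi) - 1.0)

rng = random.Random(7)
sample = sample_gross_error(10_000, eps=0.05, ideal=gpd_draw,
                            outlier=lambda r: 1e9, rng=rng)
n_outliers = sum(1 for x in sample if x == 1e9)
print(n_outliers / len(sample))
```

In practice $H$ is unknown, so a simulation like this can only probe particular contaminations, never the whole neighborhood.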

Continuity, more precisely equi-continuity, of a functional on such neighborhoods (uniformly in growing sample size) is
then called qualitative robustness (Hampel et al., 1986, Sec. 2.2 Def. 3).



In robust statistics one distinguishes between global and local robustness of an estimator. Local robustness asks how
small deviations, in the extreme case a single observation, influence the value of the estimator. This is captured by the influence
function IF, a functional derivative$^2$ of the estimator defined as (Hampel et al., 1986, Sec. 2.1 Def. 1):

$$\psi(x) := \mathrm{IF}(x; T, F) = \lim_{\varepsilon \to 0} \big(T((1-\varepsilon)F + \varepsilon\delta_x) - T(F)\big)/\varepsilon, \tag{4}$$

provided the limit exists and where $\delta_x$ denotes the Dirac measure in $x$. This influence function exactly gives us the infinitesimal
influence of a single observation on the estimator. Under additional assumptions, one can read off the asymptotic variance of
the estimator in the ideal model as the second moment of $\psi$. Infinitesimally, i.e., for $\varepsilon \to 0$, the maximal bias on the neighborhood (3) is just
$\sup|\psi|$, where $|\cdot|$ denotes the Euclidean norm; $\sup|\psi|$ is then also called gross error sensitivity (GES) (Hampel et al., 1986,
(2.1.13)). An estimator is locally robust iff its GES is finite.
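
A finite-sample analogue of the influence function is the sensitivity curve, which rescales the change of the estimate caused by adding one observation. The sketch below is an illustration outside the GPD setting: it compares the mean and the median on an arbitrary symmetric toy sample; the function name is hypothetical.

```python
import statistics

def sensitivity_curve(estimator, sample, x):
    """n * (T(x_1, ..., x_{n-1}, x) - T(x_1, ..., x_{n-1})) with n = len(sample) + 1."""
    n = len(sample) + 1
    return n * (estimator(sample + [x]) - estimator(sample))

# Toy sample: 101 equally spaced points, symmetric around 0.
sample = [i / 10 for i in range(-50, 51)]
points = (10.0, 1e3, 1e6)
sc_mean = [sensitivity_curve(statistics.mean, sample, x) for x in points]
sc_median = [sensitivity_curve(statistics.median, sample, x) for x in points]
print(sc_mean)    # grows without bound in x
print(sc_median)  # stays constant once x is beyond the central points
```

The unbounded curve of the mean mirrors an infinite gross error sensitivity, whereas the bounded curve of the median mirrors a finite one.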

Global robustness of the estimator describes the behavior of the estimator under massive distortions. It may be quantified
by the breakdown point of the estimator, i.e., the maximal radius $\varepsilon$ the estimator can cope with without producing an arbitrarily
large bias; it comes with a functional and a finite sample notion, see (Hampel et al., 1986, Sec. 2.2 Defs. 1, 2) for formal
definitions. Mathematically this is, hence, nothing but the closest singularity of the max-bias curve.

Robust estimators are constructed to be both globally and locally robust. This stability comes at the cost of some efficiency 
in the ideal model: compared to the classically optimal estimator, i.e., the MLE in most cases, robust estimators are less efficient
as quantified by the asymptotic relative efficiency (ARE), i.e., the ratio of the respective two (traces of the) asymptotic 
(co)variances, which is strictly smaller than 1 as a rule, while a (maximal) value of 1 would indicate that we attain the same 
accuracy as the (classically) optimal estimators. Such an estimator would be called efficient. 

2.2 Robust Methods for Operational Risk 

As detailed in Subsection 1.2, data is an important issue in estimation of operational risk. This issue, by arguments as in
Chernobai and Rachev (2006), can be approached by robust statistics. In particular this helps controlling the bias induced
by outliers, censoring, and data heterogeneity, which can result in systematic over- or underestimation of operational risk.
From a regulatory perspective, underestimation is to be avoided, while overestimation would not be equally harmful. A risk
manager, on the other hand, also has to take into account opportunity costs when not investing available capital, so for him
overestimation is also an issue.

A common misunderstanding when applying robust estimation to extremes is that the extremes themselves are outliers.
This need not be the case; in fact, outliers are observations which are not following the general pattern of data, which
is not necessarily connected to size. From Dell'Aquila and Embrechts (2009), we retain three main messages concerning
application of robust methods to extremes: 1) "Robust methods do not downweigh extreme observations if they conform
to the majority of data." 2) "Robust methods can guarantee a stable efficiency, MSE, and a bounded bias over a whole
neighborhood of the assumed distribution." 3) "Robust methods can identify influential points in real data".

2 Strictly speaking in mathematical terms, this is the Gâteaux derivative of $T$ into the direction of the tangent $\delta_x - F$. To derive certain properties from
this differentiability, in particular asymptotic normality, this notion is in fact too weak, and one has to apply stronger notions like Hadamard or Fréchet
differentiability; for details, see Fernholz (1983) or (Rieder, 1994, Ch. 1).

Applications of robust statistics to extreme value distributions can be found in, e.g., Field and Smith (1994); Dupuis
(1998); Dupuis and Field (1998); Peng and Welsh (2001); Dupuis and Morgenthaler (2002); Juarez and Schucany (2004);
Brazauskas and Kleefeld (2009).


2.3 Optimally Robust Estimation — Applied to GPD 

To operationalize robust estimation, quality criteria are needed, which summarize the behavior of an estimator on a whole
neighborhood, as in (3). In this context two canonical criteria for parameter estimators have emerged: maximal MSE
(maxMSE) on some neighborhood $\mathcal{U}$ around the ideal model and maximal bias (maxBias) on the respective neighborhood.

Robust Optimality Problems This gives the following optimization problems:

(Opt1) minimize maxMSE on $\mathcal{U}$,    (Opt2) minimize maxBias on $\mathcal{U}$.

The respective optimal estimators are called OMSE (Optimal MSE estimator) and MBRE (Most Bias Robust Estimator),
respectively. A variant (Opt1') of (Opt1) separates MSE into bias and variance and requires

(Opt1') minimize the variance in the ideal model subject to a uniform bias bound $b$ on $\mathcal{U}$

giving OBRE (Optimally Bias Robust Estimator)$^3$ as discussed in GPD context, e.g., in Dupuis and Field (1998).



Remark Radius $\varepsilon$ and bias bound $b$ can be seen as tuning parameters determining the degree of robustness. The larger $\varepsilon$ (smaller $b$)
the more robust is the respective optimal procedure. The most frequently used tuning criterion though is the Anscombe criterion choosing
$b$ such that a prescribed ARE, typically 95%, is achieved in the ideal model. This criterion does not properly reflect the difficulty of the
respective robustness problem, however. Instead, we propose a different criterion yielding estimator RMXE below. In particular, in the GPD
model, for $\xi = 0.7$, with the Anscombe criterion, we may drop down to 14% relative efficiency for sufficiently large radius when compared
to the OMSE, knowing this radius, whereas RMXE (also without knowing the radius) never drops below 68% in the same criterion.



Shrinking neighborhoods For solving these problems, we note that, as a rule, bias and variance scale differently on
neighborhoods of size $\varepsilon$ for growing sample size $n$: while variance usually is $O(1/n)$, maximal bias is $O(\varepsilon)$ (for robust
estimators). So for growing $n$, with fixed neighborhood size $\varepsilon$, bias will be dominant eventually in $n$, leading only to problems
of type (Opt2). To balance bias and variance, the shrinking neighborhood approach (see Rieder (1994), Ruckdeschel (2006),
Kohl et al. (2010)) sets $\varepsilon = \varepsilon_n = r/\sqrt{n}$ for some initial radius $r \in [0, \infty)$.
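
A heuristic first-order calculation indicates why the rate $1/\sqrt{n}$ is the balancing one. It is only a sketch under the assumptions that the estimator is asymptotically linear with influence function $\psi$, that its variance is approximately $E|\psi|^2/n$, and that its maximal bias on a neighborhood of radius $\varepsilon$ is approximately $\varepsilon \sup|\psi|$:

```latex
\begin{align*}
n \cdot \mathrm{maxMSE}
  &\approx n \Big( \tfrac{1}{n}\, E|\psi|^2 + \varepsilon^2 \sup|\psi|^2 \Big)
   = E|\psi|^2 + n\,\varepsilon^2 \sup|\psi|^2, \\
\varepsilon \text{ fixed:}\quad
  & n\,\varepsilon^2 \sup|\psi|^2 \to \infty \quad (n \to \infty), \\
\varepsilon = \varepsilon_n = r/\sqrt{n}:\quad
  & n \cdot \mathrm{maxMSE} \approx E|\psi|^2 + r^2 \sup|\psi|^2 .
\end{align*}
```

With the shrinking radius, variance and squared bias contribute at the same order, so that neither term can be neglected in the optimization.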



While in Subsection 2.1 we have started with a given procedure and then determined its influence function, in the
shrinking neighborhood approach, optimality is assessed by determining optimal influence functions and, in a second step 
then estimators are constructed which have this optimal influence function ("uniformly on the shrinking neighborhood"). 

One has to admit that the justification of this approach is merely asymptotic, i.e., for large sample size. Whereas general
statements for finite sample properties are out of reach, for given estimators these properties can be assessed through
simulations: in the simulation study carried out for the GPD case in Ruckdeschel and Horbenko (2010), the respective asymptotically
optimal estimators remained optimal (among the considered alternatives) down to sample size $n = 40$.



ALEs The key concept behind this is asymptotically linear estimators (ALEs). In the simplest setting, we start with a
smooth ($L_2$-differentiable) parametric model $\mathcal{P} = \{P_\theta,\ \theta \in \Theta\}$ for independent, identically distributed observations $X_i \sim P_\theta$
with open parameter domain $\Theta \subset \mathbb{R}^k$, with score$^4$ $\Lambda_\theta$ and finite Fisher information $\mathcal{I}_\theta = E_\theta \Lambda_\theta \Lambda_\theta^\tau$. In this setting, an
influence function is any function $\psi_\theta \in L_2(P_\theta)$ with $E_\theta \psi_\theta = 0$ and $E_\theta \psi_\theta \Lambda_\theta^\tau = \mathbb{I}_k$ where $\mathbb{I}_k$ is the $k$-dimensional unit matrix.
The set of all such influence functions is denoted by $\Psi_2(\theta)$. Then a sequence of estimators $S_n = S_n(X_1, \ldots, X_n)$ is an ALE if

$$S_n = \theta + \frac{1}{n}\sum_{i=1}^{n} \psi_\theta(X_i) + o_{P_\theta^n}(n^{-1/2}) \tag{5}$$

for some influence function $\psi_\theta \in \Psi_2(\theta)$ (which is uniquely specified by (5)). In the sequel we fix the true $\theta \in \Theta$ and suppress
it from notation where unambiguous. Note that the set of ALEs covers a huge variety of estimators, including MLEs,
M-estimators, Z-estimators, L-estimators, R-estimators, quantiles, and many more; in fact, to derive asymptotic normality of an
estimator, most frequently a representation like (5) is shown as an intermediate result. In particular, the MLE usually has
influence function $\psi^{\mathrm{MLE}} = \mathcal{I}^{-1}\Lambda$.

3 The terms OBRE and MBRE are taken from Hampel et al. (1986), while the notion OMSE is coined in Ruckdeschel and Horbenko (2010).
4 Usually $\Lambda_\theta$ is the logarithmic derivative of the density w.r.t. the parameter, i.e., $\Lambda_\theta(x) = \partial/\partial\theta \log p_\theta(x)$.
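The GPD case: Model (2) is smooth, i.e., $L_2$-differentiable, as the density $f_\theta$ is differentiable in $\theta$ and the corresponding
Fisher information is finite and continuous in $\theta$ (Witting, 1985, Satz 1.194), with $L_2$-derivative (coordinates ordered as $(\xi, \beta)$)

$$\Lambda_\theta(z) = \Big(\frac{1}{\xi^2}\log(1+\xi z) - \frac{\xi+1}{\xi}\,\frac{z}{1+\xi z}\ ;\ -\frac{1}{\beta} + \frac{\xi+1}{\beta}\,\frac{z}{1+\xi z}\Big)^\tau, \qquad z = \frac{x-u}{\beta}, \tag{6}$$

and Fisher information $\mathcal{I}_\theta$ as

$$\mathcal{I}_\theta = \frac{1}{(2\xi+1)(\xi+1)} \begin{pmatrix} 2 & \beta^{-1} \\ \beta^{-1} & \beta^{-2}(\xi+1) \end{pmatrix}. \tag{7}$$

As $\mathcal{I}_\theta$ is positive definite for $\xi > 0$, $\beta > 0$, the model is (locally) identifiable. In particular, both coordinates of $\psi^{\mathrm{MLE}}$ are
unbounded, which implies that the MLE is not locally robust, as it has infinite GES.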
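
The closed-form Fisher information can be checked numerically against the second moment of the scores. The following sketch uses hypothetical function names, orders the coordinates as (shape, scale), and integrates by a simple midpoint rule after substituting $x = u + \beta(p^{-\xi} - 1)/\xi$ with $p$ uniform on $(0,1)$; the parameter values are arbitrary.

```python
import math

def gpd_score(x, u, xi, beta):
    """Derivatives of the GPD log-density w.r.t. (xi, beta) at x > u."""
    z = (x - u) / beta
    w = z / (1.0 + xi * z)
    d_xi = math.log1p(xi * z) / xi**2 - (xi + 1.0) / xi * w
    d_beta = -1.0 / beta + (xi + 1.0) / beta * w
    return d_xi, d_beta

def gpd_fisher_information(xi, beta):
    """Closed-form Fisher information matrix in the (xi, beta) ordering."""
    c = 1.0 / ((2.0 * xi + 1.0) * (xi + 1.0))
    return [[2.0 * c, c / beta], [c / beta, c * (xi + 1.0) / beta**2]]

def fisher_by_quadrature(xi, beta, m=100_000):
    """Midpoint rule for E[score score^T] over the survival probability p."""
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(m):
        p = (j + 0.5) / m
        x = beta / xi * (p ** (-xi) - 1.0)   # GPD quantile with u = 0
        s = gpd_score(x, 0.0, xi, beta)
        for a in range(2):
            for b in range(2):
                acc[a][b] += s[a] * s[b] / m
    return acc

exact = gpd_fisher_information(0.7, 2.0)
approx = fisher_by_quadrature(0.7, 2.0)
print(exact)
print(approx)
```

The shape coordinate of the score grows logarithmically in $x$ and the scale coordinate, although bounded itself, is mixed with it by the inverse Fisher information, which is one way to see why the MLE's influence function is unbounded.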

Optimal solutions ALEs are of particular interest as many of their asymptotic properties can be obtained, even uniformly
on neighborhoods, solely based on their influence functions. For instance, Problem (Opt1') becomes

$$\text{minimize } E|\psi|^2 \ \text{ subject to } \sup|\psi| \le b, \quad \psi \in \Psi_2. \tag{8}$$

In particular, the influence functions corresponding to OMSE and MBRE, $\hat\psi$ and $\bar\psi$, respectively, are determined in
(Rieder, 1994, Thms. 5.5.7 and 5.5.1) as solutions to the implicit equations

$$\hat\psi = Y \min\{1, b/|Y|\}, \qquad Y = A\Lambda - a, \qquad r^2 b = E(|Y| - b)_+, \tag{9}$$

in case (Opt1), and, similarly, for case (Opt2) by

$$\bar\psi = b\,Y/|Y|, \qquad Y = A\Lambda - a, \qquad b = \max_{a,A}\{\operatorname{tr}A / E|Y|\}, \tag{10}$$

where $\operatorname{tr}(A)$ is the trace of $A$, $(\cdot)_+ = \max(\cdot, 0)$, and $A \in \mathbb{R}^{k \times k}$, $a \in \mathbb{R}^k$, $b > 0$ are Lagrange multipliers ensuring that $\psi \in \Psi_2$.

Remark (i) Both $\hat\psi$ and $\bar\psi$ are built up from an affine transformation $Y$ of the scores $\Lambda$. The term $bY/|Y|$ retains the direction of $Y$ but clips
it to length $b$. For $\hat\psi$ this clipping is done whenever the length of $Y$ is larger than $b$, whereas in $\bar\psi$ one always clips.

(ii) The solution to (Opt1') coincides with the one for (Opt1), except that instead of the utmost right equation in (9) to determine $b$,
bias bound $b$ is already fixed in advance in (Opt1').

(iii) Both $\hat\psi$ and $\bar\psi$ only can become 0 if $Y = 0$, which means that contrary to practitioners' rules, in the optimally-robust influence
functions, observations are not thrown away when they are "large" or when their influence measured by $|Y|$ is large. At most, their influence
gets clipped.

(iv) Insisting on $\psi \in \Psi_2$ also ensures (asymptotic) unbiasedness in the ideal model, which is not true per se if, in a model of asymmetric
distributions as the GPD, we simply skip the largest observations. As a rule such estimators obtained from skipping large order statistics
will need a bias correction.
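
The two clipping schemes can be sketched as follows, taking the transformed score value $Y$ and the clipping height $b$ as given. Determining $A$, $a$, and $b$ requires solving the implicit equations and is not shown; the function names are hypothetical.

```python
import math

def norm(y):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in y))

def psi_omse(y, b):
    """Y * min{1, b/|Y|}: keep Y if |Y| <= b, otherwise shrink it to length b."""
    r = norm(y)
    scale = 1.0 if r <= b else b / r
    return [scale * c for c in y]

def psi_mbre(y, b):
    """b * Y/|Y|: always rescale Y to length b (the zero vector stays zero)."""
    r = norm(y)
    return [0.0 for _ in y] if r == 0.0 else [b * c / r for c in y]

print(psi_omse([3.0, 4.0], b=10.0))  # inside the ball: unchanged
print(psi_omse([3.0, 4.0], b=1.0))   # outside: direction kept, length 1
print(psi_mbre([3.0, 4.0], b=10.0))  # always on the sphere of radius 10
```

In both cases the direction of $Y$ survives, which is the formal counterpart of the statement that observations are clipped rather than discarded.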

One-step construction Having determined the optimally-robust influence functions, we still have to solve the already
mentioned construction problem, i.e., find an estimator achieving these prescribed influence functions $\psi = \hat\psi, \bar\psi$. Several
techniques are available, see (Rieder, 1994, Ch. 6); for simplicity, we apply the one-step construction: for some suitably
robust and consistent starting estimator $\theta_0$, such an estimator is defined as

$$S_n = \theta_0 + \frac{1}{n}\sum_{i=1}^{n} \psi_{\theta_0}(X_i).$$

Then $S_n$ is an ALE with influence function $\psi$. As a key feature, at least as long as $\Theta = \mathbb{R}^k$ $^5$, the breakdown point properties
of the starting estimator $\theta_0$ are inherited unchanged to $S_n$.

The starting estimator $\theta_0$ in this construction is required to be sufficiently smooth and to be of accuracy $O_{P_\theta^n}(1/\sqrt{n})$, but
not necessarily to be optimally accurate, which leaves us some choice. For computational efficiency, we would require $\theta_0$ to
be computationally fast and that it does not require an initialization itself. For global robustness of $S_n$, we choose $\theta_0$ to have
highest possible breakdown point.

In the GPD case, little is known about the highest attainable breakdown point. According to Ruckdeschel and Horbenko
(2011), promising candidates for $\theta_0$ are given by so-called LD-estimators (Marazzi and Ruffieux (1999)), which obtain their
estimates for shape and scale by matching empirical location and dispersion measures against their respective model counterparts.
For this paper, we confine ourselves to the use of the particular LD-estimator MedkMAD, which in Ruckdeschel and Horbenko
(2011) has proven best among all considered candidates according to both computational efficiency and breakdown point. It
is defined as follows: as location measure we use the median $m$, whereas as dispersion measure we use kMad, an asymmetric version of
the MAD, defined as

\[ \mathrm{kMad} = \inf\{ s > 0 : F(m + ks) - F(m - s) \ge \tfrac{1}{2} \} \]

Parameter $k$ reflects the skewness of the distribution and has to be tuned; for our purposes it suffices to take $k = 10$.

5 For example, in case of scale parameter $\beta$ in the GPD, restricted to $(0, \infty)$, this can be achieved by a logarithmic transformation of the parameter space.
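
The matching of median and kMad can be sketched as follows. This is not the distrEx implementation nor the authors' code; it is a minimal stdlib sketch under stated assumptions: the data are exceedances over the threshold (threshold subtracted), the shape is positive and searched in the bracket [1e-4, 20], and the empirical kMad uses the closed interval [m - s, m + ks]. All function names are hypothetical.

```python
import math
import statistics
from bisect import bisect_left, bisect_right

def gpd_cdf(x, xi, beta=1.0):
    """GPD cdf for exceedances (threshold 0), shape xi > 0, scale beta > 0."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-math.log1p(xi * x / beta) / xi)

def gpd_median(xi, beta=1.0):
    return beta * (2.0 ** xi - 1.0) / xi

def find_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Plain bisection; requires a sign change of f on [lo, hi]."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0.0:
        raise ValueError("no sign change on the bracket")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def gpd_kmad(xi, k):
    """Model kMad of GPD(xi, beta=1): solves F(m + k s) - F(m - s) = 1/2 for s."""
    m = gpd_median(xi)
    return find_root(lambda s: gpd_cdf(m + k * s, xi) - gpd_cdf(m - s, xi) - 0.5, 0.0, m)

def empirical_kmad(xs, k):
    """Smallest s such that [m - s, m + k s] holds at least half of the sample."""
    srt = sorted(xs)
    m = statistics.median(srt)
    cands = sorted({m - x for x in srt if x <= m} | {(x - m) / k for x in srt if x >= m})
    for s in cands:
        inside = bisect_right(srt, m + k * s) - bisect_left(srt, m - s)
        if 2 * inside >= len(srt):
            return s
    return cands[-1]

def medkmad(exceedances, k=10.0):
    """Match empirical median and kMad to their GPD counterparts; returns (xi, beta)."""
    m_hat = statistics.median(exceedances)
    ratio = empirical_kmad(exceedances, k) / m_hat   # scale-free, depends on xi only
    xi = find_root(lambda t: gpd_kmad(t, k) / gpd_median(t) - ratio, 1e-4, 20.0)
    beta = m_hat * xi / (2.0 ** xi - 1.0)            # invert the median formula
    return xi, beta
```

The sketch exploits that kMad divided by the median does not depend on the scale, so the shape is found from a one-dimensional root search and the scale then follows from the median. If the empirical ratio lies outside the range attainable in the bracket, `find_root` raises an error rather than returning a value.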



Unknown radius As is visible from its defining equations, OMSE requires the radius $r$ of the neighborhood to be known, which is almost
never the case in practice. To this end, we apply a concept from Rieder et al. (2008): for any (arbitrarily fixed) radius $s$ and
fixed procedure OMSE$_s$ (optimal for radius $s$), we vary the true radius $r$ and determine the maximal efficiency loss in terms of
relative maxMSE in relation to the best procedure knowing the true radius $r$ (i.e., OMSE$_r$), and then, in an outer loop, minimize
this maximal efficiency loss, varying $s$. This gives a least favorable radius $s = r_0$ for the neighborhood. The estimator optimal
on the neighborhood of this radius $r_0$, i.e., OMSE$_{r_0}$, is called radius-minimax estimator (RMXE) and is recommended.
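
The inner maximization and outer minimization described above can be sketched on a grid. The sketch works in a toy model, one-dimensional normal location with clipped influence functions, where the maximal MSE has a closed form; it is not the GPD computation of this paper. The procedures are indexed by their clipping height `c` instead of the radius `s` they are optimal for, and the grids are arbitrary assumptions.

```python
from statistics import NormalDist

STD = NormalDist()

def max_mse(c, r):
    """Asymptotic maxMSE at radius r of the estimator with clipping height c (N(theta,1) location)."""
    p = 2.0 * STD.cdf(c) - 1.0                      # P(|Z| < c)
    a = 1.0 / p                                     # standardizing constant
    # Variance term: a^2 * E[min(Z^2, c^2)].
    var = a * a * (p - 2.0 * c * STD.pdf(c) + 2.0 * c * c * (1.0 - STD.cdf(c)))
    bias = a * c                                    # sup-norm of the influence function
    return var + r * r * bias * bias

def radius_minimax(c_grid, r_grid):
    """Return the clipping height with the smallest worst-case inefficiency, and that inefficiency."""
    # Risk of the best procedure when the true radius r is known.
    best = [min(max_mse(c, r) for c in c_grid) for r in r_grid]

    def inefficiency(c):
        # Inner loop: worst relative maxMSE over all true radii.
        return max(max_mse(c, r) / b for r, b in zip(r_grid, best))

    # Outer loop: minimize the worst-case inefficiency.
    c_star = min(c_grid, key=inefficiency)
    return c_star, inefficiency(c_star)

c_grid = [0.05 * i for i in range(1, 81)]   # clipping heights 0.05 .. 4.0
r_grid = [0.1 * j for j in range(0, 101)]   # true radii 0 .. 10
c_star, ineff = radius_minimax(c_grid, r_grid)
```

The returned inefficiency is the price paid for not knowing the radius: the factor by which the maximal MSE can exceed that of the procedure tuned to the true radius.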



2.4 Software Implementation 



As general software environment, we use the open source software R, see R Development Core Team (2011).

The solution of the implicit equations (9) and (10) involves numerical solution of fixed point equations as well as numerical
integration to evaluate the expectations. A general object-oriented framework for the implementation of these solutions
can be found in R package ROptEst (Kohl and Ruckdeschel (2011b)). This package also covers RMXE.

The implementation of kMad can be found in R package distrEx (Kohl and Ruckdeschel (2011)). Similarly, the MedkMAD
estimator has been implemented in R by the second author; code is available upon request.

In the GPD case, we encounter certain difficulties caused by the lack of (complete) equivariance. For computational
efficiency, the respective Lagrange multipliers arising in MBRE, OMSE, and RMXE have therefore been archived for a
sufficiently dense grid of $\xi$-values, so that for arbitrary starting values of the shape coordinate of MedkMAD, the respective
Lagrange multipliers needed to compute the one-step estimator can easily be obtained by interpolation. R code is again
available upon request.



3 Data Set and Evaluation of the Optimally-Robust Procedures 

Of course, we are interested in applying these procedures to real data. For this purpose, we use the Algo OpData database
from Algorithmics Inc. Algo OpData contains operational losses extracted from public data sources such as news media
and the regulatory bodies. As of July 2010, the database includes more than 12,000 publicly reported operational risk losses
from all industry sectors. These data were collected in 1972-2010, with the majority of losses recorded within the last 20 years. In
particular, it provides detailed information about operational loss events over one million USD from 2431 financial institutions
in compliance with Basel II business line and event type definitions. We use for calculations only data from the financial sector,
which comprise 5462 losses over mostly 20 years, not adjusted for inflation. For practical application, the data should be
scaled by an appropriate scaling method (BIS, 2010, §254) and adjusted for inflation (BIS, 2010, §191), but in this paper we
use the data without scaling and inflation adjustment for illustration purposes.

Since the data is collected from public sources, due to the thresholding/censoring mentioned in Subsection 1.2, the severity
of losses is likely to be extremal (heavy-tailed). This makes Algo OpData different from other external operational loss
data stemming from, e.g., the ORX database. In that sense, it is appropriate to consider the losses unexpected: they can be
used for scenario analyses or to model the extreme tails of severity distributions.

As required in Basel II (2006), Algo OpData is structured as a matrix with nine columns with respective business lines
(BL) of the institutions and seven rows representing the operational risk event types (ET) (see Table 1). Here, $N$ is the total
number of losses from $I$ financial institutions over $T$ years, $n_{ij}$ denotes the number of losses for the $(ET_i, BL_j)$ cell, and $\lambda_{ij}$
is its average per year for a single institution, so that the following holds:

\[ N = \sum_i \sum_j n_{ij}, \quad n_{i\cdot} = \sum_j n_{ij}, \quad n_{\cdot j} = \sum_i n_{ij}, \quad \lambda_{ij} = \frac{n_{ij}}{IT}, \quad \lambda_{i\cdot} = \frac{n_{i\cdot}}{IT}, \quad \lambda_{\cdot j} = \frac{n_{\cdot j}}{IT} \]
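
The marginal counts and frequencies can be checked with a few lines of arithmetic. The column totals are those of Table 1; taking I = 2431 institutions and T = 20 years is an assumption based on the data description ("mostly 20 years"), chosen because it is consistent with the frequencies printed in the table.

```python
# Column totals n_.j per business line, as reported in Table 1.
n_col = {"AS": 86, "AM": 600, "CB": 793, "CF": 268, "PS": 147,
         "RB": 1339, "Rbrok": 633, "TS": 530, "Others": 1066}

I, T = 2431, 20          # institutions and years (T = 20 is an approximation)

N = sum(n_col.values())                                   # total number of losses
share = {bl: n / N for bl, n in n_col.items()}            # n_.j / N
lam = {bl: n / (I * T) for bl, n in n_col.items()}        # losses per institution and year

for bl in n_col:
    print(f"{bl:7s} n={n_col[bl]:5d}  share={100 * share[bl]:5.1f}%  lambda={lam[bl]:.3f}")
```

For Asset Management this gives a share of about 11% and a frequency of about 0.012 losses per institution and year, the value used in Table 2.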

For brevity we demonstrate the estimation for one BL only, i.e., Asset Management. Taking the threshold of u = 1.6
million USD (which gives 500 tail events) and applying the MedkMAD estimator (with k = 10) to datasets from Algo
OpData, we get starting estimates for scale and shape. Performing a correction step with RMXE, we get the final values for
these parameters. For comparison, we calculate the maximum likelihood estimator (MLE) and the MBRE. The results of the
estimation are presented in Table 2.

ET \ BL |    AS |    AM |    CB |    CF |    PS |    RB | Rbrok |    TS | Others | n_i. | n_i./N | λ_i.
BDSF    |       |       |       |       |    10 |     9 |     4 |     4 |      9 |   36 |     1% | 0.001
CPBP    |    51 |   260 |   171 |   172 |    46 |   343 |   329 |   273 |    570 | 2215 |    41% | 0.046
DPA     |       |       |     5 |       |     1 |     4 |       |     1 |     15 |   26 |     0% | 0.001
EPWS    |     1 |    11 |    20 |     5 |       |    39 |    53 |    23 |     61 |  213 |     4% | 0.004
EDPM    |     4 |    20 |    45 |    18 |    14 |    94 |    46 |    46 |    149 |  436 |     8% | 0.009
EF      |    14 |    48 |   287 |    30 |    31 |   333 |    25 |    18 |     54 |  840 |    15% | 0.017
IF      |    16 |   261 |   265 |    43 |    45 |   517 |   176 |   165 |    208 | 1696 |    31% | 0.035
n_.j    |    86 |   600 |   793 |   268 |   147 |  1339 |   633 |   530 |   1066 | 5462 |        |
n_.j/N  |    2% |   11% |   15% |    5% |    3% |   25% |   12% |   10% |    20% |      |        |
λ_.j    | 0.002 | 0.012 | 0.016 | 0.006 | 0.003 | 0.028 | 0.013 | 0.011 |  0.022 |      |        |

rows:
BDSF   Business Disruption and System Failures
CPBP   Clients Products and Business Practices
DPA    Damage to Physical Assets
EPWS   Employment Practices and Workplace Safety
EDPM   Execution Delivery and Process Management
EF     External Fraud
IF     Internal Fraud

columns:
AS     Agency Services
AM     Asset Management
CB     Commercial Banking
CF     Corporate Finance
PS     Payment and Settlement
RB     Retail Banking
Rbrok  Retail Brokerage
TS     Trading & Sales

Table 1: Algo OpData: the operational risk data structured by business lines and event types according to the Basel II requirements.

Column 'Others' contains loss data from business lines other than the ones defined in Basel II (2006).



As indicated in Subsection 2.4, the implementation of influence functions of RMXE and MBRE is taken from R package
ROptEst and enhanced by code of the second author, who also provides the code for MedkMAD, while MLE is taken from
R package POT (Ribatet (2009)).



The VaR calculated with MLE is the smallest, the one calculated by MBRE is the largest. Since the actual quantile is
unknown, we cannot judge their quality without looking at the diagnostic plots given in Section 4.

λ = 0.012
Estimator | β/15 | ξ    | OpVaR/15
MedkMAD   | 0.98 | 1.47 | 25.36
MLE       | 1.04 | 1.28 | 18.79
RMXE      | 1.01 | 1.43 | 24.11
MBRE      | 0.98 | 1.52 | 27.74

Table 2: Estimates of scale, shape of GPD, and 1-year OpVaR_{99.9%} in millions USD at average number of losses per year λ for AM BL.

From both theory and simulation results of Ruckdeschel and Horbenko (2010), it follows, though, that in ideal situations
MLE is optimal, whereas in the presence of only minor contamination MLE becomes unreliable, in which situation OMSE
and RMXE clearly are the best choices.

This means that adding single, extremely large or small losses to the data would change the OpVaR value obtained by MLE
considerably, even if this added loss is of no relevance, whereas the value obtained through RMXE and
MBRE would change only slightly. On the other hand, in general, we have no means to decide for sure whether a certain
extreme loss is an outlier, so this loss should have influence on the calculation of risk. As mentioned, our optimal estimators
have this property: every observation counts, i.e., each observation does exert a certain, albeit bounded, influence on the
estimation.
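
How an OpVaR figure follows from fitted GPD parameters can be sketched with the single-loss approximation mentioned in the Conclusion, in its standard form: the 99.9% quantile of the compound loss is approximated by the severity quantile at level 1 - (1 - 0.999)/λ. The function names are hypothetical, and the handling of the threshold `u` and of units is an assumption; the example does not attempt to reproduce the OpVaR column of Table 2.

```python
def gpd_quantile(p, xi, beta, u=0.0):
    """Quantile of a GPD with shape xi > 0, scale beta > 0, threshold u."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return u + beta / xi * ((1.0 - p) ** (-xi) - 1.0)

def gpd_cdf(x, xi, beta, u=0.0):
    """cdf of the same GPD; zero below the threshold."""
    if x <= u:
        return 0.0
    return 1.0 - (1.0 + xi * (x - u) / beta) ** (-1.0 / xi)

def sla_opvar(alpha, lam, xi, beta, u=0.0):
    """Single-loss approximation: severity quantile at level 1 - (1 - alpha) / lam."""
    p = 1.0 - (1.0 - alpha) / lam
    if p <= 0.0:
        raise ValueError("frequency lam too small for this approximation")
    return gpd_quantile(p, xi, beta, u)

# Illustrative inputs only: rounded shape and scale values of the kind shown in Table 2.
var_mle = sla_opvar(0.999, 0.012, xi=1.28, beta=1.04)
var_mbre = sla_opvar(0.999, 0.012, xi=1.52, beta=0.98)
```

Because the shape enters as an exponent, small differences in the estimated shape translate into large differences in the resulting quantile, which is why the choice of estimator matters so much here.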



4 Diagnostic Plots 

Diagnostic plots in robust statistics aim at analyzing data for possible outliers and their influence on the underlying estimator. 
We have looked at the following diagnostic plots: influence function plots, outlyingness plots, and QQ plots with robust 
confidence bands. They should help practitioners to better understand the robust methods when applying them to real data. 
All these diagnostics are available in the R package ROptEst. 

4.1 Influence Function Plots 

The influence function quantifies the (infinitesimal) influence of each data point on the estimator. If the influence function of
an estimator is unbounded, so is the GES (see Figure 2(a)), and single outliers can cause the respective estimator to produce
heavily biased estimates. Robust estimators have bounded influence functions (e.g., RMXE in Figure 2(b)).

As we jointly estimate shape and scale of the GPD, the influence function has two coordinates called influence curves, i.e.,
$IC = (IC_\xi, IC_\beta)$. On top of the lines representing the curves themselves, we have plotted the actual observations marked as
filled circles. The saturation of the points at the bottom of the graph reflects the concentration of the observations, and the
radius of the points represents the size of their (joint) influence on $(\xi, \beta)$ in terms of $|IC|$.

Figure 2: Maximum likelihood and radius-minimax influence functions: (a) influence function of MLE, (b) influence function of RMXE. On the x-axis the values of the observations are plotted, on the y-axis the
respective value of the influence functions for scale and shape parameter. The influence function for the scale parameter, $IC_\beta$, is scaled to $\beta$ equal to one.

A positive [negative] value of a coordinate of the influence function at a certain observation indicates that, infinitesimally,
this observation has increased [decreased] the value of the respective parameter coordinate. Sometimes this helps
in identifying the observation(s) which has/have caused a high or low value of the parameter estimate. Also, a disequilibrium
of positive and negative values in a coordinate would be clearly visible. Without loss of generality, assume we have many
more observations with positive value in one coordinate of the influence function; then, as the influence function must be
centered, this can only happen if there are at least some observations with a considerably negative influence.

As visible in the graphs, RMXE smoothly distributes the influence of the observations, with no outstandingly influential
observations (due to boundedness). In contrast, by design, MLE cannot take into account outliers, so it considers large observations
as highly informative for parameter $\xi$, thereby attributing high influence to some few observations at the very right of
the plot of $IC_\xi$.
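
The unboundedness of the MLE influence function can be made concrete with a short sketch. It uses the standard facts that the MLE influence function is the inverse Fisher information applied to the scores, and that for the GPD with shape ξ and scale β the inverse Fisher information is (1 + ξ) [[1 + ξ, -β], [-β, 2β²]]. The function names are hypothetical, and the sketch assumes ξ > 0 and threshold 0.

```python
import math

def gpd_logpdf(x, xi, beta):
    """Log-density of the GPD (threshold 0) at x >= 0."""
    return -math.log(beta) - (1.0 / xi + 1.0) * math.log1p(xi * x / beta)

def gpd_scores(x, xi, beta):
    """Partial derivatives of the log-density with respect to (xi, beta)."""
    z = x / beta
    d_xi = math.log1p(xi * z) / xi ** 2 - (1.0 / xi + 1.0) * z / (1.0 + xi * z)
    d_beta = (-1.0 + (1.0 + xi) * z / (1.0 + xi * z)) / beta
    return d_xi, d_beta

def mle_ic(x, xi, beta):
    """Influence function of the MLE: inverse Fisher information times scores."""
    s_xi, s_beta = gpd_scores(x, xi, beta)
    f = 1.0 + xi
    ic_xi = f * (f * s_xi - beta * s_beta)
    ic_beta = f * (-beta * s_xi + 2.0 * beta ** 2 * s_beta)
    return ic_xi, ic_beta

# The shape coordinate keeps growing (logarithmically) as the observation grows.
for x in (10.0, 1e3, 1e6):
    print(x, mle_ic(x, xi=1.0, beta=1.0))
```

The shape coordinate grows without bound as the observation increases, which is the behavior visible at the right end of the MLE panel.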



4.2 Outlyingness Plot 

Outlyingness plots help to detect outliers, i.e., observations which deviate to some extent from the majority of data.
The plots discussed here translate ideas discussed in Hubert et al. (2005) to our GPD case; this case is not covered by the
cited reference, as the model neither falls into the scope of (multivariate) location-scale type models nor is a regression
model. Still, we follow the authors in the following two-step procedure:

In a first step, model parameters and covariances are estimated from the data by robust techniques. In the presence of outliers,
classical estimators are prone to masking effects: some few large outliers may distort our quantification of outlyingness
such that other (smaller) outliers are no longer identifiable; similarly, but less harmful in most cases, some "clean data" may
look like outliers in the (distorted) perspective of the outlyingness measure, an effect called swamping. Robust procedures
avoid both effects to a large extent.

In a second step, for outlier detection, we apply an unbounded criterion to the data, e.g. the quadratic form defining 
the Mahalanobis norm. This unboundedness helps to discern outliers properly, which in a bounded criterion would become 
indistinguishable from non-outliers. However, where model parameters and covariances are needed to evaluate this criterion, 
e.g. the covariance to determine the Mahalanobis norm, we use the robust ones from the first step. 

Usually, to visualize outlyingness, two criteria from the second step are used in parallel, one for the x-axis and one for the
y-axis. In each coordinate a threshold (preferably a suitable high quantile) is chosen, giving a partition into four quadrants.
Observations simultaneously falling beyond both thresholds are flagged as outliers, which, of course, must be seen as only an
indication of being an outlier, as both usual error types of a test may occur.

There are different variations of outlyingness plots: distance-distance, distance-projection, and projection-projection plots.
Our outlyingness plot for the GPD is a distance-projection plot, which for parameter estimation uses RMXE and for covariances
the Minimum Covariance Determinant (MCD) estimator from Rousseeuw (1984), as implemented in R package rrcov,
see Todorov (2009). More precisely, we plot a (robustified, empirical) Mahalanobis distance of the MLE influence function
against the usual data quantiles. This gives Figure 3(a). We use thresholds given by the 99% quantile of the $\chi^2$-distribution
with non-centrality on the y-axis and the 99% quantile of the data on the x-axis.

Figure 3: Diagnostic plots: outlyingness plot and QQ plot with robust confidence bands
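
The flagging rule behind such a plot can be sketched as follows. This is a simplified stand-in, not the procedure used for Figure 3(a): the robust center and covariance are taken as given inputs instead of being computed by MCD, and the y-threshold uses the central χ² distribution with two degrees of freedom (whose quantile has the closed form -2 log(1 - p)) rather than a non-central one. All names are hypothetical.

```python
import math

def mahalanobis(v, center, cov):
    """Mahalanobis distance of a 2-d point for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = v[0] - center[0], v[1] - center[1]
    # Quadratic form with the inverse of [[a, b], [c, d]].
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)

def chi2_quantile_2df(p):
    """Quantile of the central chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - p)

def empirical_quantile(xs, p):
    """Smallest order statistic with at least a fraction p of the data at or below it."""
    srt = sorted(xs)
    idx = max(0, math.ceil(p * len(srt) - 1e-9) - 1)
    return srt[idx]

def flag_outliers(xs, ic_values, center, cov, p=0.99):
    """Indices of observations beyond both thresholds (upper-right quadrant)."""
    x_cut = empirical_quantile(xs, p)
    d_cut = math.sqrt(chi2_quantile_2df(p))
    return [i for i, (x, v) in enumerate(zip(xs, ic_values))
            if x > x_cut and mahalanobis(v, center, cov) > d_cut]
```

An observation is flagged only if it is extreme in both coordinates; a point with a large distance but an unremarkable data value, or the other way round, is left unflagged.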

Table 4 shows which operational losses in the Asset Management BL are flagged as outliers in Figure 3(a). Of the six
outlying losses, indexed as 246, 318, 320, 321, 322, 444, four are caused by the recent fraud by B. L. Madoff and
one resulted from a Ponzi scheme fraud. Although these losses are probably outliers, they should be included in
the estimation instead of being skipped, as they could also carry some valuable information for future losses. Classical MLE,
however, interprets these values as "usual observations" and, as a consequence, assigns them too much influence, no matter
whether their relevance or reproducibility is doubtful or not. Robust RMXE includes these doubtful observations too, but
downweighs them, so that their influence on the resulting estimates is smaller than that of the remaining losses (see Table 3).

4.3 QQ Plot With Robust Confidence Bands 

Quantile-quantile (QQ) plots aim at visualizing the quality of a model fit: empirical quantiles of the observations are plotted 
against the quantiles of the fitted model distribution. A concentration of the plotted points around the line y = x indicates a 
high quality, while large deviations indicate outliers or a failure of the model fit. 

Still, there is estimation uncertainty in the data, which can be captured by suitable confidence intervals grouped to bands
according to their position, wider [narrower] bands indicating higher [lower] uncertainty.

As usual in this context, there are both pointwise and simultaneous confidence bands. Pointwise confidence intervals 
describe the stochastic variability of the empirical distributions of the data for each quantile individually, while simultaneous 
confidence bands capture the variability of the whole empirical cumulative distribution function (ecdf), so that, on average, 
95% of the graphs produced by ecdfs will completely lie within these bounds. 

Taking outlier-induced model deviations into account, for robust confidence bands the nominal confidence level has to be
adjusted accordingly: to warrant a nominal level $\alpha$ we have to increase the defining level to $\alpha + r/\sqrt{n}$.
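
The effect of this level adjustment on a simultaneous band can be illustrated with a short sketch. As a stand-in for the band construction used in the paper, it takes the half-width from the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, which is conservative; the radius `r = 0.5` and the function names are assumptions for illustration only.

```python
import math

def adjusted_level(level, r, n):
    """Raise the defining confidence level by r / sqrt(n), capped below 1."""
    return min(level + r / math.sqrt(n), 1.0 - 1e-12)

def dkw_halfwidth(n, level):
    """Half-width eps with P(sup |F_n - F| > eps) <= 1 - level (DKW inequality)."""
    return math.sqrt(math.log(2.0 / (1.0 - level)) / (2.0 * n))

n = 500                                   # number of tail events in the example
eps_classical = dkw_halfwidth(n, 0.95)
eps_robust = dkw_halfwidth(n, adjusted_level(0.95, r=0.5, n=n))
print(eps_classical, eps_robust)          # the robust band is slightly wider
```

The adjustment vanishes at rate 1/sqrt(n), so for large samples robust and classical bands nearly coincide, while for moderate samples the robust band is visibly wider.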

The QQ plot of RMXE-estimated GPD quantiles versus real quantiles is depicted in Figure 3(b). The size of the points
reflects their weight in the influence function, so that downweighed observations get smaller circles. One can see that the fit
is good in the lower and middle quantiles where (at least in the middle) also model uncertainty is low, but poorer in the upper
ones around 4, where the points even fall outside the (simultaneous) confidence bands. This phenomenon appears to be due
to the outlying data points in the tails (that at least get downweighed by RMXE). The widening of the confidence bands at the
lower and upper ends is common and caused by the little empirical evidence available in this area.



Obs. Index | Loss Value (billions USD) | Weight
246        |  6.0                      | 0.18
318        | 65.0                      | 0.11
320        |  2.4                      | 0.24
321        |  7.2                      | 0.17
322        |  3.3                      | 0.21
444        |  4.0                      | 0.20

Table 3: Weights of outliers in RMXE with corresponding loss values

Outlier | Business Line    | Event Type                              | Organization                          | Loss Amount (billions USD) | Settlement | Location
246     | Asset Management | Clients Products and Business Practices | Amaranth Advisors                     |  6.0                       | 9/18/2006  | North America, Canada, Alberta
318     | Asset Management | Internal Fraud                          | Bernard Madoff Investment Services LLC | 65.0                      | 12/11/2008 | North America, United States, New York
320     | Asset Management | External Fraud                          | Ascot Partners L.P.                   |  2.4                       | 12/16/2008 | North America, United States, New York
321     | Asset Management | External Fraud                          | Fairfield Greenwich Group             |  7.2                       | 12/15/2008 | North America, United States, Connecticut
322     | Asset Management | External Fraud                          | MassMutual Financial Group            |  3.3                       | 12/16/2008 | North America, United States, New York
444     | Asset Management | Internal Fraud                          | Cash Plus                             |  4.0                       | 10/9/2009  | Caribbean, Jamaica

Table 4: Outlying events in Asset Management business line






Conclusion 

This article applies optimally-robust estimation techniques to real world data for the calculation of the regulatory capital for 
operational risks within the LDA (AMA) setting, according to Basel II requirements. The data we use is taken from the Algo 
OpData database of Algorithmics Inc. No scaling has been applied, so the results we obtain are only meant for illustrative 
purposes. 

Still, all other steps required in LDA have been gone through: we model the severity of tail events by a GPD
and the frequency of losses with a Poisson distribution, and apply a single-loss approximation for the corresponding 99.9%
quantile of the compound loss distribution. For estimation of the GPD parameters, we focus on the respective optimally-robust
estimators MBRE, OMSE, and RMXE, in their specialization to the GPD case taken from Ruckdeschel and Horbenko (2010),
where they are also compared with several competitors and, as predicted by theory, turn out optimal even at sample sizes down
to 40. For these estimators, we use a robust starting estimator, MedkMAD, based on the median and the asymmetric median
of absolute deviations. Its qualification as a globally robust, computationally efficient starting estimator has been taken from
Ruckdeschel and Horbenko (2011).



In evaluating our estimators we have found no difficulties. In the case of business line Asset Management, our robust estimators
indicate the need for a higher regulatory capital than indicated by classical MLE (28% higher for RMXE), and a value
of 28% for the relative deviation indicates the presence of influential outliers. A statement of the type "robustly estimated
OpVaR is generally higher than the one obtained by classical methods", however, is not true. The order varies from business
line to business line.

To assess the quality of our robust estimates and the respective model fit on real data, and to discern potential outliers, we
present robust diagnostic plots. On the present data set, our outlyingness plot was able to grasp the singular pattern of the
Madoff fraud. For the majority of the data, however, the robust model fit according to the QQ plot seems reasonably good. In
the influence function plot, we see that on the actual data, in particular the shape parameter is affected by highly influential
observations in the MLE case, whereas no such pattern is visible in the RMXE case.

For the evaluation of the respective estimators, as well as for the diagnostic plots, we use publicly available software
provided in the R package ROptEst, tuned for computational efficiency with our own code, as well as our own routines for the
computation of MedkMAD; the code is available upon request.